runtime: export total GC Assist ns in MemStats and GCStats #55159

Conversation

nvanbenschoten
Contributor

At a high level, the runtime garbage collector can impact user goroutine latency in two ways. The first is that it pauses all goroutines during its stop-the-world sweep termination and mark termination phases. The second is that it backpressures memory allocations by instructing user goroutines to assist with scanning and marking in response to a high allocation rate.

There is plenty of observability into the first of these sources of user-visible latency; there is significantly less into the second. As a result, it is often more difficult to diagnose latency problems due to over-assist (e.g. #14812, #27732, #40225). Until now, the main ways to determine that GC assist was a problem have been execution tracing and GODEBUG=gctrace=1 tracing, neither of which is easy to access programmatically in a running system. CPU profiles also give some insight, but are rarely as instructive as one might expect, because heavy GC assist time is scattered across the profile. Note that even in https://tip.golang.org/doc/gc-guide, the guidance on recognizing and remedying performance problems due to GC assist is sparse.
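For context, a minimal sketch of what capturing an execution trace programmatically looks like today; the resulting trace still has to be inspected offline with `go tool trace` to find assist time, so this is not something a running system can easily act on:

```go
package main

import (
	"os"
	"runtime/trace"
)

func main() {
	// Write an execution trace to a file for later inspection with
	// `go tool trace trace.out`, which breaks out GC assist time.
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// ... workload under investigation ...
}
```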

This commit adds an AssistTotalNs counter to the MemStats and GCStats structs, which tracks the cumulative nanoseconds user goroutines have spent in GC assist since the program started. This provides a new form of observability into GC assist delays, and one that can be consumed programmatically.
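For illustration, a rough sketch of how the new counter could be consumed, assuming the field lands on runtime.MemStats as AssistTotalNs as proposed here (it does not exist in released versions of Go):

```go
package main

import (
	"log"
	"runtime"
)

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// AssistTotalNs is the field proposed by this change: cumulative
	// nanoseconds spent by user goroutines in GC assist since start.
	assistSeconds := float64(ms.AssistTotalNs) / 1e9
	log.Printf("cumulative GC assist time: %.2fs", assistSeconds)
}
```

A monitoring system could sample this counter periodically and alert on its rate of increase, much as PauseTotalNs is used today for stop-the-world pauses.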

There's more work to be done in this area. This feels like a reasonable first step.

@gopherbot
Contributor

This PR (HEAD: 0e70f5d) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/431877 to see it.


@gopherbot
Contributor

Message from Michael Knyszek:

Patch Set 1: Hold+1

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/431877.
After addressing review feedback, remember to publish your drafts!
